Prediction of intraoperative cerebrospinal fluid leaks in endoscopic endonasal transsphenoidal pituitary surgery based on a deep neural network model trained with MRI images: a pilot study

Objective This study aimed to investigate the reliability of a deep neural network (DNN) model trained only on contrast-enhanced T1 (T1CE) images for predicting intraoperative cerebrospinal fluid (ioCSF) leaks in endoscopic transsphenoidal surgery (EETS). Methods 396 pituitary adenoma (PA) cases were reviewed, only primary PAs with Hardy suprasellar Stages A, B, and C were included in this study. The T1CE images of these patients were collected, and sagittal and coronal T1CE slices were selected for training the DNN model. The model performance was evaluated and tested, and its interpretability was explored. Results A total of 102 PA cases were enrolled in this study, 51 from the ioCSF leakage group, and 51 from the non-ioCSF leakage group. 306 sagittal and 306 coronal T1CE slices were collected as the original dataset, and data augmentation was applied before model training and testing. In the test dataset, the DNN model provided a single-slice prediction accuracy of 97.29%, a sensitivity of 98.25%, and a specificity of 96.35%. In clinical test, the accuracy of the DNN model in predicting ioCSF leaks in patients reached 84.6%. The feature maps of the model were visualized and the regions of interest for prediction were the tumor roof and suprasellar region. Conclusion In this study, the DNN model could predict ioCSF leaks based on preoperative T1CE images, especially in PAs in Hardy Stages A, B, and C. The region of interest in the model prediction-making process is similar to that of humans. DNN models trained with preoperative MRI images may provide a novel tool for predicting ioCSF leak risk for PA patients.

It has been reported that a diaphragmatic defect resulting from suprasellar invasion of the PA is responsible for ioCSF leaks (Conger et al., 2018;Li et al., 2022). According to the Hardy-Wilson modified scale, tumor extrasellar extension is divided into five Stages: Stage 0, with no suprasellar extension. Stage A, tumor occupying suprasellar cistern. Stage B, recess of third ventricle obliterated. Stage C, third ventricle displaced. Stage D, intracranial extension. Stage E, intracranial extension and cavernous sinus extension. PAs in Stage 0 are associated with a low risk of diaphragmatic defect, while those in Stages D and E indicate a higher risk of diaphragmatic defect (Li et al., 2022). Previous studies using multivariate logistic regression showed that large tumors, irregular tumor contours, and tumor texture consistencies are prominent risk factors for ioCSF leaks (Karnezis et al., 2016;Patel et al., 2018;Xue et al., 2020;Li et al., 2022). These risk factors, however, cannot be quantified and used to predict ioCSF leaks in PA patients, especially those with Hardy suprasellar Stages A, B, and C.
Artificial intelligence and deep learning have revolutionized the fields of medicine and neurosurgery. Deep neural network (DNN) models can now predict various clinical tasks that would have been impossible using previous statistical models. Magnetic resonance imaging (MRI) is a non-invasive imaging technology that is used for the diagnosis and treatment of PAs routinely. MRI contains the information of tumor size, contour, and consistency of PAs, thus making it the ideal data source for DNN model training.
In this study, we investigated whether a DNN trained using only contrast-enhanced T1 (T1CE) images could predict ioCSF leakage. Additionally, the DNN model was tested in clinical settings, and model interpretability was also explored.

Patients
We reviewed 396 PA patients who underwent EETS between May 2017 and October 2022 at the Department of Neurosurgery, Jinling Hospital, retrospectively. The preoperative and postoperative MRI images and surgical records were routinely collected, and all patients were treated according to the same surgical protocol by two senior surgeons. The ethical committee of Jinling Hospital approved the study registry and data collection.
We excluded cases of PAs in Hardy suprasellar Stages 0, D, and E, as well as cases of PAs without gross-total-resection. Additionally, recurrent cases and cases with a history of medical therapy were also excluded to make sure that diaphragmatic defects were caused by tumors. Consequently, a total of 51 cases were included in the ioCSF leakage group. To investigate model interpretability, we randomly selected 51 cases with primary PAs in Hardy suprasellar Stages A, B, and C from the non-intraoperative CSF (non-ioCSF) leakage group to balance the control group. Finally, a total of 102 cases were enrolled in this study. In the clinical test, an external dataset from 26 consecutive cases with primary PA in Hardy Stages A, B, and C was tested ( Figure 1).

Tumor segmentation
IWS software (Version 1.0, Medinsightech Corp., Shanghai, China) were used for sorting and de-identification of DICOM images. DICOM images of T1CE were then converted into NIFTI images using MRI Convert software (Version 2.1, University of Oregon, Eugene, United States). T1CE NIFTI images were manually segmented by a senior radiologist (Xiang-jun Ji) using ITK-Snap software (Version 3.8, University of Pennsylvania, Philadelphia, United States). Segmentation labels for PAs were saved as tumor masks.

Image preprocessing 2.4.1. Re-sample and normalization
Axially isotropical re-sampling of MRI images to 1 mm × 1 mm × 1 mm using the B-Spline of order 1 interpolation algorithm. MRI images were normalized to reduce the inconsistency of gray-scale information in identical tissues caused by differences in equipment acquisition and scanning parameters, while still preserving the diagnostic value of gray-scale differences.

Slice selection
T1CE images with PAs were selected based on the tumor mask. For each patient, three coronal and three sagittal slices with the largest tumor size were selected. The labels of the six slices were the same as the patient's. Finally, 306 coronal and sagittal slices labeled as ioCSF Frontiers in Neuroscience 03 frontiersin.org leakage and 306 coronal and sagittal slices labeled as non-ioCSF leakage were obtained as the original dataset for further processing.

Slice resize
Since the coronal and sagittal slices of the MRI had different dimensions, the slices need to be uniformly resized so that we can use coronal and sagittal slices for model training. The size of the sagittal slice was 250*176 pixels, while the coronal slice was 188*176 pixels. In this study, both coronal and sagittal slices were uniformly resized to 192 *192 pixels.

Data augmentation
Data enhancement techniques such as exponential and logarithmic transforms, histogram equalization, power transforms, Laplace image sharpening, adding Laplace noise and left-right rotation were used. After data augmentation, a total of 4,896 slices were obtained, half with ioCSF leakage labels and the other half with non-ioCSF leakage labels.

Neural network architecture
U-Net is one of the most commonly used neural networks for biomedical image segmentation. For our prediction task, we utilized a customized U-Net architecture. Only the down-sampling part of the U-net was kept to extract features, followed by several fully connected layers and dropout layers for conclusive classification. The neural network architecture is shown in Figure 2.
The size of the input image is 192*192 pixels. The downsampling process consisted of four modules. Each module contained two convolution operations and one max pooling operation. The feature map decreased in size and the perceptual field grew larger as the depth of the neural network increased, while extracting high-level features. The last feature map passes through a sequence of three fully connected layers and the final output is produced via the softmax layer. To reduce overfitting, two dropout layers were integrated into the fully connected layers. Throughout the training process, the feature maps provided comprehensible visualization.

Model training and validation
Before model training, 400 slices from each group were randomly selected as the testing dataset. The remaining 4,096 slices were randomly shuffled and divided into 80% for model training and 20% for model validation. The hyperparameters in the model training were set as follows: using SGD as the optimizer, cross-entropy as the loss function, a learning rate of 0.0001, a batch size of 128, a momentum of 0.9, and a weight decay of 0.0001. The learning rate decreased by 5% every five epochs. The evaluation metrics of model performance included accuracy, sensitivity, and specificity. The formulas are as follows: TP TN Accuracy TP FN TN FP Flowchart of patient selection. PA, pituitary adenoma; GTR, gross total resection; and CSF, cerebrospinal fluid. where TP is the true positive, TN is the true negative, FN is the false negative and FP is the false positive. 5-fold cross-validation was also performed on the training dataset to evaluate the model performance.

DNN model workflow
The flowchart of the DNN model is depicted in Figure 3, where its main components are: (1) data preprocessing, (2) model training and validation, (3) model performance evaluation, and (4) clinical testing.

Clinical test and human-machine comparison
Our DNN model makes the prediction based on the single slice. When used to make predictions on patients, different slices from one patient may produce conflicting predictions by the DNN model. To address this problem, we introduced a slice-voting strategy that converted slice-based predictions into patient-based predictions. Three coronal and three sagittal slices containing the maximum tumor size were selected from a single patient. Data augmentation was also used to increase the number from 6 to 48 slices. Forty-eight results from 48 slices voted for the final prediction of the patient.
In clinical test, two senior experts with more than 10 years of experience in pituitary surgery predicted the occurrence of ioCSF leaks for each PA patient before surgery. The prediction accuracy of the two experts was then compared to that of the DNN model.

Statistical analysis
All statistical analysis were performed using SPSS 19.0 (IBM Corp., Armonk, New York, United States). A t-test was performed to determine the difference between two groups in Hardy's suprasellar Stages A, B, and C. p value <0.05 was considered statistically significant.

Patient information and PA characteristics
A total of 102 patients with primary PAs in Hardy suprasellar Stages A, B, and C were included in the study. In the ioCSF leakage group, there Neural network architecture. Each down-sampling module consists of two 3*3 convolution layers and a 2*2 max pooling layer, which is used for image down-sampling to extract features. After three fully connected layers, the final softmax is used as the estimation of the final class distribution. Conv, convolution; ReLU, rectifiers linear unit; FC, full connection layer.
Frontiers in Neuroscience 05 frontiersin.org were 27 female and 24 male patients. In the non-ioCSF leakage group, there were 24 female and 27 male patients. The average age for the ioCSF leakage group and the non-ioCSF leakage group were 49.0 and 49.5 years old, respectively. The average value of tumor maximum diameters in the ioCSF leakage and non-ioCSF leakage groups were 24.6 and 20.9 mm, respectively. The average volume of tumors in the ioCSF leakage and non-ioCSF leakage groups were 3.9 and 3.5 cm 3 , respectively.

Model training, validation, and testing
In this study, the model was trained for 300 epochs, and the trends of the accuracy of the validation dataset and the training loss are illustrated in Figure 4. Based on the 5-fold cross-validation, the average accuracy on the validation dataset was 97.62%, while the average sensitivity and specificity were 98.31 and 96.92%, respectively. Moreover, on the test dataset, the prediction accuracy was 97.29%, with a sensitivity of 98.25% and a specificity of 96.35%. (Table 2).

Clinical test and human-machine comparison test
In the clinical test, an external 26 patients with primary PAs in Hardy suprasellar Stages A, B, and C were tested consecutively. Four out of 26 patients observed ioCSF leaks in surgery. The prediction accuracies of Expert 1 and Expert 2 were 53.85% and 42.31%, respectively. The DNN model achieved a prediction accuracy of 84.46%, which is more than 30% higher than that of the two experts (Table 3).

Interpretability of the DNN model
To investigate the interpretability of the DNN model, feature map visualization was generated during the whole model training process  Accuracy curve (A) and loss curve (B) of the model during training. acc, accuracy; sen, sensitivity; and spe, specificity. ( Figure 5). Heatmaps were applied to highlight the regions of interest for the DNN model during the prediction-making process. Figure 6 shows that the DNN model focuses on the upper part of the tumor and the suprasellar regions, which contain diaphragmatic information.
Similarly, the DNN model focused on the same regions as the experts when predicting the occurrence of ioCSF leaks.

Discussion
Endoscopic transsphenoidal surgery currently serves as the gold standard for treating PAs. However, a considerable number of patients suffer from postoperative CSF leaks following ioCSF leaks (Mehta and Oldfield, 2012;Karnezis et al., 2016;Mooney et al., 2017). As a key predictor of postoperative CSF rhinorrhea, ioCSF leaks have an incidence rate of 2.8-41% during transsphenoidal procedures   (Mehta and Oldfield, 2012;Karnezis et al., 2016;Chen et al., 2017;Przybylowski et al., 2017;Strickland et al., 2017;Patel et al., 2018). In this study, 61 cases out of 261 cases (24.1%) were observed with ioCSF leaks. Significant risk factors for ioCSF leaks in EETS include tumor consistency, tumor size, and repetitive surgery (Karnezis et al., 2016;Xue et al., 2020;Li et al., 2022). To ensure that factors other The flow chart of the neural network converting the input image into the feature map, and the visualization of the feature map of different downsampling modules. The input image size is 192*192 pixels. Each down-sample block halves the size of the image and doubles the number of feature channels. The fully connected layer and softmax layer calculated class scores based on the final feature map to predict ioCSF leaks.

FIGURE 6
The heatmaps in the sagittal and coronal planes show the hotspot regions that the DNN model focused when making prediction of ioCSF leaks.
Frontiers in Neuroscience 08 frontiersin.org than the tumor itself, such as radiotherapy, drugs, and repeat surgery, were not involved, only primary PA cases were included in this study.
To minimize surgical effects on ioCSF leaks, two senior neurosurgeons with more than 10 years of pituitary surgery experience performed all EETS procedures using the same surgical protocol. Previous studies that utilized the multivariate logistic regression method revealed that tumor size, texture, irregular upper tumor contour, gonadotrophic-positive staining, and higher BMI in patients were risk factors for ioCSF leaks (Karnezis et al., 2016;Zhou et al., 2017;Xue et al., 2020;Li et al., 2022). However, traditional statistical models cannot quantitatively analyze these risk factors and provide a straightforward prediction when applied to a single PA patient. Machine learning models can achieve much higher prediction performance than logistic regression models. Staartjes et al. (2019) used patients' sex, age, tumor diameter, tumor volume, Knosp grade, and Hardy grade as input features to train their neural network model, which provided a prediction accuracy of 90% on the validation dataset and 88% on the testing dataset. In a comparison of the model performances of machine learning (Random Forest) and conventional statistical methods (multivariable logistic regression), Mattogno et al. showed that machine learning models had a 30% higher prediction accuracy than conventional statistical models (Qian et al., 2020). In this study, we utilized T1CE images instead of clinical features to train our DNN model. The reasons are as follows: Firstly, T1CE images contain a number of risk factors for ioCSF leaks, including tumor size, shape, and texture. Secondly, T1CE images are more objective than clinical features extracted by clinicians. Finally, in clinical settings, MRI images are more convenient to use as input data for DNN models than models based on multiple features.
In this study, only primary PAs in Hardy Stages A, B, and C were included. The reasons are as follows: first, the prediction of ioCSF leaks for PAs in Hardy Stages A, B and C was challenging, even for experts. Second, PAs in Hardy Stages A, B, and C are the most common types encountered in clinical practice. Finally, limiting the spectrum of PA cases would reduce the long tail effect, simplify model training, and be helpful in investigating the model's interpretability.
We used a customized U-Net architecture as our neural network. The U-Net is frequently adopted for semantic segmentation in the field of biomedical imaging. Semantic segmentation involves the need for pixel-level discrimination and the transfer of learned features from various encoder stage to each pixel. Unlike semantic segmentation, our objective was to predict the occurrence of ioCSF leaks in EETS for a given PA. We exclude the upsampling component of U-Net because restoring image resolution through upsampling is redundant and may lead to model overfitting.
Three consecutive slices were selected, containing the largest tumor size in both sagittal and coronal views, in order to incorporate more comprehensive information regarding tumor size, texture, and upper contour. Data augmentation can expand the training data and improve the generalization ability of the model. The implementation of data augmentation significantly improved the performance of our DNN model on both validation and testing datasets, with an average accuracy rate of 97.62% on the validation dataset and 97.29% on the testing dataset.
In the clinical test, a slice-voting strategy algorism was used to make the DNN model could make direct predictions on PA patients. Twenty-six patients with primary PAs in Hardy Stages A, B, and C were tested preoperatively. The DNN model gave a prediction accuracy of 84.61%. Compared with the result on the test dataset, it had dropped by 12%, but it was 30% higher than the accuracy predicted by the two experts.
Deep learning is often referred to as a "black box" algorithm because it is not clear to users how the models extract features to make the final predictions. To explore model interpretability, we restricted data enrollment criteria (only PAs in Hardy Stages A, B, and C were included in this study) and performed data balance between the two groups (random selection 51 cases in the control group). Additionally, feature visualization techniques were deployed to observe feature maps from image input to result output, and heatmaps were used to show the regions of interest that the DNN model focused on during prediction. Interestingly, the heatmaps showed that the hotspots were in the tumor roof and suprasellar regions, which coincided with human experts' focus on predicting ioCSF leaks.

Limitations
Although our DNN model demonstrated excellent performance, the results were based on a small dataset of a single MR Scanner in a single center, which may have limited the reliability of our model in an external patient population with different MR Scanners and different centers. However, this pilot study demonstrates the feasibility of the DNN model to predict ioCSF leaks for PA patients preoperatively.

Conclusion
Preoperative prediction of ioCSF leakage in patients with PAs during EETS remains challenging, especially for PAs in Hardy Stages A, B, and C. We trained a DNN model using T1CE images which provided a slice-based accuracy of 97.29% on the test dataset, and a patient-based accuracy of 84.61% in the clinical test. This pilot study demonstrates the feasibility of the DNN model in predicting ioCSF leaks in PA patients before surgery.

Data availability statement
The data analyzed in this study is subject to the following licenses/restrictions: Dataset not available due to ethical restrictions. Requests to access these datasets should be directed to X-JS, shukelson@msn.com.
Author contributions X-JS and S-BC: conceptualization and supervision. KZ, JQ, and X-JJ: data curation. HC, KZ, and X-JS: formal analysis. HC and KZ: Frontiers in Neuroscience 09 frontiersin.org